4 research outputs found

    Fast, Accurate Processor Evaluation Through Heterogeneous, Sample-Based Benchmarking

    Performance evaluation is a key task in computing and communication systems. Benchmarking is one of the most common evaluation techniques: the performance of a set of representative applications is used to infer system responsiveness in a general usage scenario. Unfortunately, most benchmarking suites are limited to a small number of applications and, in some cases, rigid execution configurations. This makes it hard to extrapolate performance metrics for a general-purpose architecture that is expected to have a multi-year lifecycle and to run dissimilar applications concurrently. The main cause is that current benchmark-derived metrics lack generality and statistical soundness, and fail to represent general-purpose environments. Previous attempts to overcome these limitations through random application mixes significantly increase computational cost (the workload population shoots up), making the evaluation process barely affordable. To circumvent this problem, in this article we present a more elaborate performance evaluation methodology named BenchCast. Our proposal provides more representative performance metrics while drastically reducing computational cost, by limiting application execution to a small, representative fraction marked through code annotation. Thanks to this labeling, and by making use of synchronization techniques, we generate heterogeneous workloads in which every application runs simultaneously inside its Region Of Interest, making a few seconds of execution highly representative of full application execution.
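
    The synchronization idea in this abstract can be illustrated with a minimal sketch. It is not BenchCast's actual implementation: the function names (`run_app`, `run_mix`), the busy-loop "applications" and the use of Python threads with `threading.Barrier` are all assumptions made for illustration. Each hypothetical application runs a warm-up of different length, then waits at a barrier so that all Regions Of Interest start together.

```python
import threading
import time


def run_app(name, warmup_steps, roi_steps, barrier, results):
    """Hypothetical stand-in for one annotated application."""
    # Warm-up phase: everything that happens before the Region Of Interest.
    for _ in range(warmup_steps):
        pass
    arrived = time.perf_counter()

    # Annotation point: block until every application in the mix
    # has reached the start of its own Region Of Interest.
    barrier.wait()

    roi_start = time.perf_counter()
    work = 0
    for i in range(roi_steps):  # the measured Region Of Interest
        work += i
    roi_end = time.perf_counter()

    results[name] = {
        "arrived": arrived,
        "roi_start": roi_start,
        "roi_end": roi_end,
        "work": work,
    }


def run_mix(apps):
    """Run a heterogeneous mix given as (name, warmup_steps, roi_steps) tuples."""
    barrier = threading.Barrier(len(apps))
    results = {}
    threads = [
        threading.Thread(target=run_app, args=(name, warmup, roi, barrier, results))
        for name, warmup, roi in apps
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    mix = [("fast_startup", 10, 100_000), ("slow_startup", 500_000, 100_000)]
    for name, r in run_mix(mix).items():
        print(name, "ROI time (s):", r["roi_end"] - r["roi_start"])
```

    In this sketch no application enters its measured region until the slowest one has finished warming up, which is the property the abstract attributes to the synchronized, annotation-based workloads. A real setup would synchronize separate processes rather than threads.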

    Message router for interconnection networks of multiprocessor systems.

    Message router for interconnection networks of multiprocessor systems, characterized by being specially adapted to the adaptive, topology-independent exchange of information among the processing elements integrated on a single chip. The router solves significant technical problems that arise when interconnecting a large number of processors on a single chip, given that the only routers currently available are designed specifically for interconnecting devices located on separate chips. The router is characterized by the following basic elements:
    - Two concentric rings, each formed by a group of dual-port buffers.
    - A set of input and output stages, equal in number to the degree of the router, through which packets from neighboring routers enter or leave.
    - An injection and consumption stage for communication with the associated processing element.
    Application: 200701403 (10.05.2007). Application publication no.: 2324577A1 (10.08.2009). Patent no.: 2324577B2 (01.02.2010).

    Memory hierarchy characterization of NoSQL applications through full-system simulation

    In this work, we conduct a detailed memory characterization of a representative set of modern data-management software (Cassandra, MongoDB, OrientDB and Redis) running an illustrative NoSQL benchmark suite (YCSB). These applications are widely popular NoSQL databases with different data models and features such as in-memory storage. We compare how these data-serving applications behave with respect to other well-known benchmarks, such as SPEC CPU2006, PARSEC and the NAS Parallel Benchmarks. The evaluation methodology relies on state-of-the-art full-system simulation tools, such as gem5. This allows us to explore configurations that are unattainable using performance monitoring units in actual hardware, and so to characterize memory properties. The results suggest that NoSQL application behavior is not dissimilar to that of conventional workloads. Therefore, some of the optimizations present in state-of-the-art hardware might have a direct benefit. Nevertheless, these applications share some aspects that set them apart from conventional benchmarks and that might be sufficiently relevant to be considered in architectural design. Strikingly, we also found that most database engines, independently of aspects such as workload or database size, exhibit highly uniform behavior. Finally, we show that different database engines make highly distinctive demands on the memory hierarchy, some being more stringent than others. This work was supported in part by the Spanish Government (Secretaría de Estado de Investigación, Desarrollo e Innovación) under Grants TIN2015-66979-R and TIN2016-80512-R.

    Scalable memory hierarchy for chip multiprocessors

    Multiprocessors are standard in current systems and represent an efficient solution to some of the technological problems encountered; however, they are not without technological constraints that limit their effectiveness. Thus, even if the increase in the number of integrated transistors seems to ensure an increase in the number of memory and processing units within the chip, off-chip connections are becoming ever scarcer relative to the number of processors. It is necessary to minimize the number of external accesses, increasing the fraction of the chip devoted to the memory hierarchy and seeking mechanisms that make more effective use of the available resources. In this thesis, we address different components of the memory hierarchy, ranging from the on-chip cache hierarchy and the interconnection network to the memory controller and the arbitration of off-chip requests. This document attempts to explain clearly the problems and solutions found in the various components of the memory hierarchy, always with the aim of finding efficient alternatives that increase scalability while bearing in mind the specific requirements of such systems.